Yuxiao Qu

Introspective X Training: Feedback Conditioning Improves Scaling Across all LLM Training Stages

May 19, 2026

QED-Nano: Teaching a Tiny Model to Prove Hard Theorems

Apr 06, 2026

IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL

Mar 12, 2026

Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL

Feb 03, 2026

POPE: Learning to Reason on Hard Problems via Privileged On-Policy Exploration

Jan 26, 2026

RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems

Oct 02, 2025

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

Mar 10, 2025

Harnessing Webpage UIs for Text-Rich Visual Understanding

Oct 17, 2024

Recursive Introspection: Teaching Language Model Agents How to Self-Improve

Jul 26, 2024

Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning

Oct 27, 2023